************************  TIMING AND TESTING ATLAS  ***************************

The ATLAS distribution has several different testing and timing methods.  For
testing, the most important testers are the standard API testers for the
C and Fortran77 BLAS libraries, and the Fortran77 LAPACK tester.  Sections
1, 2, and 3 deal with performing these tests.

ATLAS also provides its own timer programs that do some rudimentary testing
as well as performing relatively sophisticated timings (involving cache
flushing, etc).  The remaining sections deal with using these timer/testers.

1. THE FORTRAN77 INTERFACE BLAS TESTERS

   The official BLAS testers for the Fortran77 interface to the legacy BLAS
   can be run in BLDdir/interfaces/blas/F77/testing/.  Typing "make" with
   no arguments will compile all of the testers (all levels & precisions).
   The user may then run the testers by:

   ./xsblat1
   ./xdblat1
   ./xcblat1
   ./xzblat1

   ./xsblat2 < ../sblat2.dat
   ./xdblat2 < ../dblat2.dat
   ./xcblat2 < ../cblat2.dat
   ./xzblat2 < ../zblat2.dat

   ./xsblat3 < ../sblat3.dat
   ./xdblat3 < ../dblat3.dat
   ./xcblat3 < ../cblat3.dat
   ./xzblat3 < ../zblat3.dat

   The user may edit the input files to perform more or less comprehensive
   tests. For more information on the legacy BLAS testers, see:

   www.netlib.org/blas/faq.html

2. THE ANSI/ISO C INTERFACE BLAS TESTERS

   The official BLAS testers for the ANSI/ISO C interface to the legacy BLAS
   can be run in BLDdir/interfaces/blas/C/testing/.  Typing "make" with
   no arguments will compile all of the testers (all levels & precisions).
   The user may then run the testers by:

   ./xscblat1
   ./xdcblat1
   ./xccblat1
   ./xzcblat1

   ./xscblat2 < ../c_sblat2.dat
   ./xdcblat2 < ../c_dblat2.dat
   ./xccblat2 < ../c_cblat2.dat
   ./xzcblat2 < ../c_zblat2.dat

   ./xscblat3 < ../c_sblat3.dat
   ./xdcblat3 < ../c_dblat3.dat
   ./xccblat3 < ../c_cblat3.dat
   ./xzcblat3 < ../c_zblat3.dat

   The user may edit the input files to perform more or less comprehensive
   tests. For more information on the legacy BLAS testers, see:

   www.netlib.org/blas/faq.html
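
   The run commands in sections 1 and 2 follow a regular pattern, so they
   can be generated rather than typed.  The Python sketch below builds the
   twelve commands for either interface.  The helper names tester_commands
   and run_testers are hypothetical and not part of ATLAS, and the script is
   assumed to be started from the matching testing/ directory after "make".

```python
import subprocess

def tester_commands(interface="F77"):
    """Return (executable, input_file_or_None) pairs in the order listed above."""
    exe_tag = "c" if interface == "C" else ""      # xscblat1 vs. xsblat1
    dat_tag = "c_" if interface == "C" else ""     # ../c_sblat2.dat vs. ../sblat2.dat
    cmds = []
    for level in (1, 2, 3):
        for pre in "sdcz":
            exe = f"./x{pre}{exe_tag}blat{level}"
            # Level 1 testers take no input file; Levels 2 and 3 read one on stdin.
            dat = None if level == 1 else f"../{dat_tag}{pre}blat{level}.dat"
            cmds.append((exe, dat))
    return cmds

def run_testers(interface="F77"):
    """Run each tester in turn; stop if one exits with a nonzero status."""
    for exe, dat in tester_commands(interface):
        if dat is None:
            subprocess.run([exe], check=True)
        else:
            with open(dat) as stdin:
                subprocess.run([exe], stdin=stdin, check=True)

# Dry run: print the commands for the C interface without executing them.
for exe, dat in tester_commands("C"):
    print(exe if dat is None else f"{exe} < {dat}")
```

   The exit status alone may not reflect individual test failures, so the
   tester output should still be read.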

3. TESTING THE FORTRAN77 INTERFACE TO LAPACK

   Because ATLAS does not provide a full replacement for LAPACK, the user must
   build a full LAPACK library combining ATLAS and LAPACK from netlib, as
   discussed in ATLAS/doc/LAPACK.txt.  The user can then modify the LAPACK
   makefile to point at this mixed library, and run the LAPACK testers as
   described in the LAPACK documentation.

4. USING ATLAS BLAS TIMER/TESTERS WITH A SYSTEM BLAS LIBRARY

   If your system already has a BLAS library installed or you have your
   own BLAS library (for instance, a library built using the Fortran77
   reference BLAS from netlib), then you can build the ATLAS Level 1-3
   TIMER/TESTER programs with it. These programs compute the Mflop/s
   rate for each routine called. In addition, they compare the result
   matrices computed by the system BLAS with those computed by the ATLAS
   library routines.  For more information about the testing
   implementation in the Level 3 programs, read section 6.1.
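
   As a rough guide to how an Mflop/s figure is obtained, the sketch below
   uses the conventional 2*M*N*K operation count for a real GEMM.  That
   count is an assumption about what the timers do, although it reproduces
   the DGEMM sample output shown in section 6.  The function name is
   hypothetical.

```python
def gemm_mflops(m, n, k, seconds):
    """Mflop/s for a real GEMM, assuming 2*M*N*K floating point operations."""
    flops = 2.0 * m * n * k
    return flops / seconds / 1.0e6

# 1000 x 1000 x 1000 problem timed at 11.25 seconds
print(f"{gemm_mflops(1000, 1000, 1000, 11.25):.1f}")   # 177.8
```

   Because the reported time is rounded to two decimals, figures recomputed
   this way for very short runs can differ from the printed Mflop column.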

   To properly build the programs with your BLAS library, make sure to
   set the BLASlib variable in the BLDdir/Make.inc include file correctly:

   BLASlib = /path/to/library/libblas.a

   On some machines, the compiler will recognize certain flags that link
   in the vendor-optimized BLAS library. You can place these in the BLASlib
   variable as well.  There are too many of these to list in detail here, but
   here are a few examples of vendor-supplied BLAS:

   BLASlib = -xlic_lib=sunperf    # on sun machines using Sun workshop compiler

   BLASlib = -ldxml               # Using Dec/Compaq's compiler
   BLASlib = -lcxml               # Using Compaq/Dec's compiler

   BLASlib = -lessl               # IBM machines using IBM compiler
   BLASlib = -lesslp2             # IBM Power2 machines using IBM compiler
   BLASlib = -lesslp3             # IBM Power3 machines using IBM compiler

   BLASlib = -lblas               # IRIX using SGI's compiler

   After you're sure that the BLASlib variable is set properly, read
   sections 6 and 6.1 on the ATLAS LEVEL 3 TIMER/TESTER PROGRAMS to learn
   how to build and run them.

5. TESTING _WITHOUT_ A BLAS LIBRARY

   You may still build and run the ATLAS TIMER/TESTER programs without a
   system BLAS library by testing against the ATLAS-provided C reference BLAS.
   Just leave the BLASlib variable in the ATLAS/Make.<arch> makefile blank:

   BLASlib =

   Then, edit ATLAS/bin/l3blastst.c, and change line 87 from:
#define USE_F77_BLAS
   to:
#define USE_L3_REFERENCE

   Edit ATLAS/bin/l2blastst.c and change line 56 from:
#define USE_F77_BLAS
   to:
#define USE_L2_REFERENCE

6. THE ATLAS LEVEL 3 TIMER/TESTER PROGRAMS

   To make the single, double, single complex, and double complex
   programs, type:

   make xsl3blastst
   make xdl3blastst
   make xcl3blastst
   make xzl3blastst

   Running the programs without arguments will time _GEMM with square
   problem sizes from 100 to 1000 in steps of 100, alpha=1.0 and beta=0.0,
   and non-transposed A and B:

   ./xdl3blastst

DGEMM
TEST  TA  TB    M    N    K  alpha   beta    Time  Mflop  SpUp  PASS
====  ==  ==  ===  ===  ===  =====  =====  ======  =====  ====  ====

   1   N   N  100  100  100    1.0    0.0    0.02  200.0  1.00   ---
   1   N   N  100  100  100    1.0    0.0    0.01  200.0  1.00   YES
   2   N   N  200  200  200    1.0    0.0    0.09  177.8  1.00   ---
   2   N   N  200  200  200    1.0    0.0    0.09  177.8  1.00   YES
   3   N   N  300  300  300    1.0    0.0    0.35  154.3  1.00   ---
   3   N   N  300  300  300    1.0    0.0    0.29  186.2  1.21   YES
   4   N   N  400  400  400    1.0    0.0    0.73  175.3  1.00   ---
   4   N   N  400  400  400    1.0    0.0    0.68  188.2  1.07   YES
   5   N   N  500  500  500    1.0    0.0    1.48  168.9  1.00   ---
   5   N   N  500  500  500    1.0    0.0    1.35  185.2  1.10   YES
   6   N   N  600  600  600    1.0    0.0    2.47  174.9  1.00   ---
   6   N   N  600  600  600    1.0    0.0    2.30  187.8  1.07   YES
   7   N   N  700  700  700    1.0    0.0    4.01  171.1  1.00   ---
   7   N   N  700  700  700    1.0    0.0    3.65  187.9  1.10   YES
   8   N   N  800  800  800    1.0    0.0    5.74  178.4  1.00   ---
   8   N   N  800  800  800    1.0    0.0    5.43  188.6  1.06   YES
   9   N   N  900  900  900    1.0    0.0    8.38  174.0  1.00   ---
   9   N   N  900  900  900    1.0    0.0    7.68  189.8  1.09   YES
  10   N   N 1000 1000 1000    1.0    0.0   11.25  177.8  1.00   ---
  10   N   N 1000 1000 1000    1.0    0.0   10.58  189.0  1.06   YES

NTEST=10, NUMBER PASSED=10, NUMBER FAILURES=0


   Notice that there are two entries for each run. The first entry
   corresponds to a call to the library that you supply, and the second
   entry corresponds to a call to the ATLAS library.
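
   Since the rows come in pairs, the output is easy to post-process.  The
   sketch below pairs each supplied-library row with the ATLAS row that
   follows it and recomputes the speedup as the ratio of the two Mflop
   values.  The column positions are taken from the DGEMM sample above and
   are assumed, not guaranteed, to hold for other routines; parse_pairs is a
   hypothetical helper.

```python
def parse_pairs(text):
    """Pair up tester rows: (test number, library Mflop, ATLAS Mflop, passed)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # GEMM data rows have 12 fields and start with the test number.
        if len(fields) == 12 and fields[0].isdigit():
            rows.append(fields)
    results = []
    # Rows alternate: supplied library first, then ATLAS.
    for lib, atl in zip(rows[0::2], rows[1::2]):
        results.append((int(lib[0]), float(lib[9]), float(atl[9]), atl[11] == "YES"))
    return results

sample = """\
   3   N   N  300  300  300    1.0    0.0    0.35  154.3  1.00   ---
   3   N   N  300  300  300    1.0    0.0    0.29  186.2  1.21   YES
"""
for test, lib_mf, atl_mf, ok in parse_pairs(sample):
    print(test, f"{atl_mf / lib_mf:.2f}", ok)   # 3 1.21 True
```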

   An explanation of each argument follows:

   ./xdl3blastst -help
   USAGE: ./xdl3blastst -R <rout> -Side <nsides> L/R -Uplo <nuplo> L/U
   -Atrans <ntrans> n/t/c -Btrans <ntrans> n/t/c -Diag <ndiags> N/U
   -M <m1> <mN> <minc> -N <n1> <nN> <ninc> -K <k1> <kN> <kinc>
   -n <n> -m <m> -k <k> -a <nalphas> <alpha1> ... <alphaN>
   -b <nbetas> <beta1> ... <betaN> -Test <0/1>

   -R <rout>    Specifies the routines which you would like to
                test/time. The routines for the single and double
                precision programs are gemm, symm, syrk, syr2k, trmm,
                and trsm (note the omission of the prefixes s and d). The
                additional routines for the single complex and double
                complex programs are hemm, herk, and her2k. You can
                also specify the argument like this:

                ./xdl3blastst -R all

                which will time all the routines. Or you can specify
                some of the routines like this:

                ./xdl3blastst -R 1 symm
                ./xdl3blastst -R 4 syrk trsm symm gemm

                but NOT like this:

                ./xdl3blastst -R 2 syr2k all

   -Side <nsides> L/R
                Specifies how many Side values to test, followed by the
                values themselves. If a routine does not take the Side
                parameter, then the argument is ignored.
                You can specify the argument like this:

                ./xdl3blastst -R symm -Side 1 L
                ./xdl3blastst -R symm -Side 2 L R
                ./xdl3blastst -R symm -Side 3 R R L

                The <nsides> argument is not optional; it must be present.

   -Uplo <nuplo> L/U
                Specifies the number of Uplo parameters you would like to
                test. Its use follows the same behavior as -Side, like this:

                ./xdl3blastst -R 2 syrk syr2k -Uplo 1 U
                ./xdl3blastst -R 2 syrk syr2k -Uplo 2 U L
                ./xdl3blastst -R 2 syrk syr2k -Uplo 4 U U U U

   -Diag <ndiags> N/U
                Specifies the number of Diag parameters you would like to
                test. Its use follows the same behavior as -Side, like this:

                ./xdl3blastst -R trmm -Diag 1 N
                ./xdl3blastst -R trmm -Diag 2 U N
                ./xdl3blastst -R trmm -Diag 4 U N U U

   -Btrans <ntrans> N/T/C
                Specifies the number of Btrans parameters you would like to
                test (only used with gemm). Its use follows the same
                behavior as -Side, like this:

                ./xdl3blastst -R gemm -Btrans 1 N
                ./xdl3blastst -R gemm -Btrans 2 T N
                ./xdl3blastst -R gemm -Btrans 4 T N T T

   -Atrans <ntrans> N/T/C
                Specifies the number of Atrans parameters you would like to
                test. Its use follows the same behavior as -Side, like this:

                ./xdl3blastst -R gemm -Atrans 1 N
                ./xdl3blastst -R gemm -Atrans 2 T N
                ./xdl3blastst -R gemm -Atrans 4 T N N T

                Also, use -Atrans for routines which only take one TRANS
                argument:

                ./xdl3blastst -R trmm -Atrans 2 T N

   -M <m1> <mN> <mInc>
   -N <n1> <nN> <nInc>
   -K <k1> <kN> <kInc>
                Specifies the range of problem sizes to run, as first
                value, last value, and increment.  To specify square
                problem sizes, use -N alone:

                ./xdl3blastst -R gemm -N 1 10 1

                will time all square matrices from dimension 1 to 10.

                ./xdl3blastst -R gemm -M 10 100 10 -N 10 100 10 -K 10 100 10

                will time every combination of M, N, and K between
                10 and 100, in steps of 10.

   -m <m>
   -n <n>
   -k <k>
                Fixes the dimension in question to one value:

                ./xdl3blastst -R gemm -K 1 100 1 -m 100 -n 100

   -a <nalphas> <alpha1> ... <alphaN>
   -b <nbetas> <beta1> ... <betaN>
                Specifies the number and the values of alphas/betas to try.

                ./xdl3blastst -R gemm -a 4 -1.0 0.0 1.0 2.0 -b 1 0.0

                For the complex precision programs, you must specify both
                the real and imaginary parts of each alpha and beta.

                ./xzl3blastst -R gemm -a 2 -1.0 0.0 1.0 0.0 -b 1 0.0 0.0

                For those complex routines that take a real scalar
                alpha/beta instead of a complex scalar alpha/beta, the
                imaginary part must still be specified, but is
                ignored.

                ./xzl3blastst -R her2k -a 1 2.0 3.0

                will time her2k with alpha=2.0.

   -Test <0/1>
                Specifies whether or not to test the results of each run.
                A brief explanation of testing is provided in section 6.1.
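
   Most of the options above are counted lists, and the size options are
   first/last/increment ranges.  The Python sketch below shows one way to
   assemble such a command line and to count the problem sizes a sweep will
   visit.  The helpers counted and sweep are hypothetical, and treating the
   last value as inclusive is an assumption based on the default run, which
   includes size 1000.

```python
from itertools import product

def counted(flag, values):
    """Build a counted option such as ['-Uplo', '2', 'U', 'L']."""
    return [flag, str(len(values))] + [str(v) for v in values]

def sweep(first, last, inc):
    """Sizes visited by a '<first> <last> <inc>' range, assuming 'last' is inclusive."""
    return list(range(first, last + 1, inc))

# Equivalent to: ./xdl3blastst -R 2 syrk syr2k -Uplo 2 U L
cmd = ["./xdl3blastst"] + counted("-R", ["syrk", "syr2k"]) + counted("-Uplo", ["U", "L"])
print(" ".join(cmd))

# -M 10 100 10 -N 10 100 10 -K 10 100 10 visits every (M, N, K) combination.
sizes = list(product(sweep(10, 100, 10), repeat=3))
print(len(sizes))   # 1000
```

   For the complex programs, counted would need adjusting for -a and -b,
   because each alpha or beta occupies two numbers but counts once.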

6.1 TESTING IMPLEMENTATION

   The LEVEL 3 TESTER/TIMER programs were created to make performance
   analysis easier, not as a validation tool, so the testing
   implementation is modest. For a complete test of ATLAS's LEVEL 3
   BLAS implementation, run the C interface BLAS testers described in
   section 2.

   For all routines except _TRSM, we compute:

                           ||C-D||
      x = ------------------------------------------
          ||A|| * ||B|| * |alpha| * eps * max(M,N,K)

   where A, B, and alpha are arguments to the routine, C is the result
   matrix from the call to a trusted BLAS library, D is the result matrix from
   the call to ATLAS, eps is the machine epsilon, and max(M,N,K) is the
   largest of the dimensions M, N, and K of the argument and result
   matrices. ||Z|| denotes the column norm of a matrix Z, and x is expected
   to be O(1) or smaller when the two results agree.
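
   The ratio can be illustrated with a small pure-Python sketch.  It
   assumes that "column norm" means the matrix 1-norm (maximum absolute
   column sum), that A and B are not transposed, and that eps is the double
   precision machine epsilon; none of these details is confirmed by the
   text, and the function names are hypothetical.

```python
import sys

def one_norm(a):
    """Maximum absolute column sum of a matrix stored as a list of rows."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))

def sub(a, b):
    """Elementwise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def gemm_ratio(a, b, c, d, alpha, eps=sys.float_info.epsilon):
    """x = ||C-D|| / (||A|| * ||B|| * |alpha| * eps * max(M,N,K))."""
    m, k, n = len(a), len(b), len(b[0])
    denom = one_norm(a) * one_norm(b) * abs(alpha) * eps * max(m, n, k)
    return one_norm(sub(c, d)) / denom

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[0.5, -1.0], [1.5, 2.0]]
c = [[3.5, 3.0], [7.5, 5.0]]          # A*B, standing in for the trusted result
d = [[3.5, 3.0], [7.5, 5.0 + 1e-6]]   # same result with one perturbed entry
print(gemm_ratio(a, b, c, c, 1.0))        # 0.0
print(gemm_ratio(a, b, c, d, 1.0) > 1.0)  # True
```

   A perturbation of 1e-6 is far above roundoff for these values, so the
   ratio is many orders of magnitude larger than 1.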

   For _TRSM, we compute:

                           ||B-A*X||
       x =  ----------------------------------------
            ||A|| * ||X|| * |alpha| * eps * max(M,N)

   where A, B, and alpha are arguments to the routine, X is the result
   matrix from the ATLAS _TRSM call, and max(M,N) is the larger
   value of M and N.
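
   A matching sketch for the _TRSM check follows.  It covers only the
   simplest case (Side = L, A not transposed, alpha = 1) and again assumes
   the matrix 1-norm and double precision epsilon; the names are
   hypothetical and the real tester may differ in these details.

```python
import sys

def one_norm(a):
    """Maximum absolute column sum of a matrix stored as a list of rows."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))

def matmul(a, b):
    """Plain triple-loop product of two matrices stored as lists of rows."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def trsm_ratio(a, x, b, alpha=1.0, eps=sys.float_info.epsilon):
    """x = ||B - A*X|| / (||A|| * ||X|| * |alpha| * eps * max(M,N))."""
    m, n = len(b), len(b[0])
    ax = matmul(a, x)
    resid = [[b[i][j] - ax[i][j] for j in range(n)] for i in range(m)]
    return one_norm(resid) / (one_norm(a) * one_norm(x) * abs(alpha) * eps * max(m, n))

a = [[2.0, 0.0], [1.0, 4.0]]     # lower triangular
b = [[2.0, 4.0], [9.0, 6.0]]
x = [[1.0, 2.0], [2.0, 1.0]]     # exact solution of A*X = B
print(trsm_ratio(a, x, b))       # 0.0
```

   Unlike the other routines, this check needs no trusted library: the
   solution is verified by substituting it back into the system.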

   The data for the argument matrices are generated internally, using the
   ANSI C rand() function, and are distributed over the interval (-0.5,+0.5).
   For every routine, if x > 1 then an error is printed:

   DGEMM
   TEST  TA  TB    M    N    K  alpha   beta    Time  Mflop  SpUp  PASS
   ====  ==  ==  ===  ===  ===  =====  =====  ======  =====  ====  ====

      1   N   N  100  100  100    1.0    0.0    0.01  259.7  1.00   ---
   ERROR: ferr is 4860974538.606986
      1   N   N  100  100  100    1.0    0.0    0.01  227.9  0.88    NO
      2   N   N  200  200  200    1.0    0.0    0.05  291.6  1.00   ---
   ERROR: ferr is 8411267408.031064
      2   N   N  200  200  200    1.0    0.0    0.06  274.5  0.94    NO
      3   N   N  300  300  300    1.0    0.0    0.17  327.2  1.00   ---
   ERROR: ferr is 2895940442.476244
      3   N   N  300  300  300    1.0    0.0    0.20  272.5  0.83    NO

   Here ferr is the value of x.
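
   The data generation and the pass/fail rule can be mimicked as follows.
   Python's random module stands in for ANSI C rand(), so the values will
   not match the tester's, and the interval here is half-open, [-0.5, 0.5).
   Both helper names are hypothetical.

```python
import random

def random_matrix(m, n, seed=0):
    """m-by-n matrix with entries drawn uniformly from [-0.5, 0.5)."""
    rng = random.Random(seed)
    return [[rng.random() - 0.5 for _ in range(n)] for _ in range(m)]

def verdict(ferr):
    """Mirror the rule in the text: a ratio above 1 is reported as a failure."""
    return "NO" if ferr > 1.0 else "YES"

a = random_matrix(3, 4)
print(all(-0.5 <= v < 0.5 for row in a for v in row))  # True
print(verdict(4860974538.606986))                      # NO
```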

   What can we infer from the error?  Not much. If the two result
   matrices are 'roughly the same', then no error is produced.
   Otherwise, they are 'not roughly the same', and the message does not
   say which library is at fault.

   So if you see this error message, it's best to test both
   libraries (if ATLAS doesn't fail, test your "trusted" BLAS)
   with the BLAS testers from netlib:

   www.netlib.org/blas/sblat3
   www.netlib.org/blas/dblat3
   www.netlib.org/blas/cblat3
   www.netlib.org/blas/zblat3

7. Timing the Level 2 BLAS

   The level 2 timer/tester is very similar in action to the level 3 timer.
   To build it, in BLDdir/bin/, type:

   make xsl2blastst
   make xdl2blastst
   make xcl2blastst
   make xzl2blastst

   The flags are very similar to those accepted by the level 3 BLAS timer.
   For usage help, type
   ./xdl2blastst -h

8. Timing the Level 1 BLAS

   The level 1 timer/tester is very similar in action to the level 2 timer.
   To build it, in BLDdir/bin/, type:

   make xsl1blastst
   make xdl1blastst
   make xcl1blastst
   make xzl1blastst

   The flags are very similar to those accepted by the level 2 BLAS timer.
   For usage help, type
   ./xdl1blastst -h

9. Timing ATLAS LU and Cholesky

   The LU and Cholesky timers may be built in ATLAS/bin/<arch> by:

   make xslutst
   make xdlutst
   make xclutst
   make xzlutst

   make xsllttst
   make xdllttst
   make xcllttst
   make xzllttst

   These timers time ATLAS's LU and Cholesky.  If you wish to time LAPACK or
   some other library's LU and Cholesky for comparison purposes, set your
   Make.inc macro FLAPACKlib to point to the appropriate library, and then type:

   make xslutstF
   make xdlutstF
   make xclutstF
   make xzlutstF

   make xsllttstF
   make xdllttstF
   make xcllttstF
   make xzllttstF

   Both LU and Cholesky testers will run default cases between 100 and 1000
   if no arguments are supplied.  Both will print terse usage information
   if the -h flag is given.  These testers are similar to the level 3 tester
   in the flags they accept (i.e., -m, -M, -n, -N, etc. all work the same).  In
   addition, the user may pass:
   -O <norders> <order1>...<orderN> :
      Whether row-major or column-major storage LU/LLt is to be tested
      (i.e., R and C are the only legal values for orderX).  Note that
      non-ATLAS implementations (such as those timed by x<pre>lutstF) can only
      test column-major arrays (the default).
   -T <thresh> :
      Supply a floating point threshold the residual must pass.  If set to
      a negative value, no testing is done (saving time and space).  If set
      to zero, all tests will be flagged as failed.
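
   The three threshold regimes can be summarized in a few lines.  This is a
   sketch of the described behavior only: the strict "less than" comparison
   is an assumption chosen so that a zero threshold fails every test, and
   threshold_verdict is a hypothetical name.

```python
def threshold_verdict(residual, thresh):
    """Return 'SKIP', 'PASS', or 'FAIL' for a scaled residual and a -T threshold."""
    if thresh < 0:
        return "SKIP"              # negative threshold: no testing is done
    # Strict comparison (assumed) makes thresh == 0 fail every test.
    return "PASS" if residual < thresh else "FAIL"

for resid, thresh in [(0.5, 10.0), (0.0, 0.0), (1e9, -1.0)]:
    print(resid, thresh, threshold_verdict(resid, thresh))
```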

10. Other timers/testers, including threading.

   ATLAS provides other timer/testers.  In particular, note that the timers
   in the bin directory have versions to test the threaded interface.  To
   build these, one simply adds the "_pt" suffix to the timer/tester name
   (e.g., "make xdlutst_pt" rather than "make xdlutst").  Many of these
   timers also have a "_dyn" suffix, which allows you to test against
   the dynamically-linked ATLAS libs, assuming you have built them.
   In addition to the LU and LLt tests mentioned above, we also have
   an inversion tester ("make xdinvtst"), a U*U' tester ("make xduumtst"),
   and a solver tester ("make xdslvtst").  These work similarly to the
   LU and LLt testers covered above.  The solve tester allows for testing
   LU, Cholesky, and, in some cases, QR solves.
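
   The target names used in sections 9 and 10 follow the pattern
   x<pre><test>tst<suffix>.  The sketch below composes them; make_target is
   a hypothetical helper, and it does not know which timers actually have a
   given variant, so a generated name may not exist as a make target.

```python
def make_target(pre, test, variant=""):
    """Compose a make target such as 'xdlutst' or 'xdlutst_pt'."""
    if len(pre) != 1 or pre not in "sdcz":
        raise ValueError("precision prefix must be one of s, d, c, z")
    if variant not in ("", "_pt", "_dyn", "F"):
        raise ValueError("unknown variant")
    return f"x{pre}{test}tst{variant}"

print(make_target("d", "lu", "_pt"))             # xdlutst_pt
print([make_target(p, "llt") for p in "sdcz"])   # the four Cholesky timers
```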
